Abstract
Several triggers have been postulated for idiopathic bone marrow failure (BMF) syndromes, but they are likely to occur or be pathogenic only in a particular immunogenetic background, which may also affect the clinical features of the disease. However, the tremendous number of immune factors and their complex combinations hamper investigation with traditional statistical approaches. Here, we introduce the concept of a “geno-immunome,” which combines a large number of multimodal immune parameters. Our hypothesis is that such a geno-immunome can shape the clinical features of a disease beyond its traditional pathomorphological and somatic genetic boundaries. We further postulate that this new strategy will help to discern immunologic commonalities between related subsets and thereby inform further clinical and mechanistic studies.
For that purpose, we used a machine learning (ML) algorithm based on unbiased, unsupervised clustering with a binary latent-factor model to generate geno-immunomic clusters (ICs). These clusters were then investigated for the distribution of diagnostic features and for pathophysiologic similarities. The input included combinatorial and individual HLA/KIR haplotypes, MICA and TAP polymorphisms, KIR/HLA network interactions, blood group antigens, HLA evolutionary divergence (HED), and many other immunogenetic polymorphisms, derived from a spectrum of diseases that may exhibit immunologic relationships. Our combined cohort comprised 1306 BMF patients, including MDS (n=771), AA/PNH (n=351), T-LGLL (n=122), and NK-LGLL (n=62).
To characterize and account for the complex, diverse KIR/HLA system, we first conducted a univariate analysis of each disease cohort vs. a meta-analytic control cohort derived from various data subsets (e.g., 1000 Genomes Project, internal controls). For instance, PNH patients had a lower frequency of the KIR2DL3 gene (84 vs. 94% in controls, p=0.04), while the KIR2DS2 gene was significantly enriched in AA (57 vs. 45%, p=0.04), among many other potentially interesting differences. HLA haplotype analysis showed overrepresentation of the DRB1*15 allele in AA (27 vs. 14%, p<0.01) and of B*45 in PNH (5 vs. 0.5%, p<0.01), while A*01 and B*44 were rare (8 vs. 16%, p=0.04, and 7 vs. 15%, p=0.04, respectively) and appear to disfavor disease evolution. In MDS patients, DRB1*11 showed the most significant difference between patients and controls (15 vs. 9%, p<0.01). When AA responders to IST were compared with non-responders, the 2DL2/C1 and 2DL1/C2 co-occurrence was enriched in responders (25 vs. 6%, p<0.01).
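A univariate comparison of this kind can be sketched as a test on a 2x2 table of carriers and non-carriers in patients vs. controls. The abstract does not state which test was used or the per-allele counts, so the choice of Fisher's exact test, the function name `fisher_exact_two_sided`, and all counts below are assumptions made only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: carriers / non-carriers among patients
    c, d: carriers / non-carriers among controls
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among patients
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts (not from the study): 27/100 carriers among patients
# vs. 28/200 carriers among controls, i.e. 27% vs. 14%.
p_value = fisher_exact_two_sided(27, 73, 28, 172)
print(f"two-sided p = {p_value:.4f}")
```

With many alleles and gene combinations tested in parallel, such p-values would normally also need a correction for multiple comparisons; whether and how that was done is not stated here.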
When we used a supervised random forest analysis, the DRB1*15:01, DQB1*06:02, and TAP*02:01 alleles ranked highest in distinguishing between diseases. Nevertheless, the accuracy of the models generated by supervised approaches was low, and they did not resolve the tremendous complexity of the immunogenetic background. To mitigate these shortcomings, we then used an unsupervised approach to uncover underlying similarities and differences in the immunogenetic profiles, enabling objective, diagnosis-independent clustering within the genotype/patient continuum. Our ML approach, based on a binary non-negative matrix factorization algorithm, revealed 17 distinct geno-immunomic clusters across all patients (IC1-IC17). Each IC was summarized as a minimal signature. Post hoc analysis showed that ICs spanned multiple disease diagnoses and phenotypes. Furthermore, some ICs were enriched for particular disease entities and clinical phenotypes in reverse analysis. For instance, IC4, which was defined by the presence of KIR2DL1/C2, KIR2DL3, DPA1*01:03, 2DL3/C1, and TAP2:01:01, had a heterogeneous disease distribution in which PNH (31%) and T-LGLL (27%) were overrepresented. In contrast, IC11, which was defined by a diverse set of features including DPA1*01:03, C*07:01, B*08:01, and TAP2:01:01, was mostly composed of LGLL and MDS cases. The IC6 cluster (immune-dominant patterns included MICB*004:01, C*07:02, and B*07:02, among others) was comparatively enriched for AA and PNH.
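The clustering step can be illustrated with a minimal sketch, assuming a patient-by-feature matrix in which each entry is 1 if the patient carries the immunogenetic feature and 0 otherwise. The code below uses plain non-negative matrix factorization with multiplicative updates, not the binary variant named above, and the function name `nmf_cluster`, the toy matrix, and the choice of two clusters are all hypothetical.

```python
import numpy as np


def nmf_cluster(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factorize X (patients x features) as W @ H with W, H >= 0.

    Uses the standard multiplicative updates for the squared-error
    objective and assigns each patient to the factor with the largest
    loading in W.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H, W.argmax(axis=1)


# Toy binary matrix (illustrative only): six patients, six features.
# Patients 0-2 carry features 0-2; patients 3-5 carry features 3-5.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

W, H, labels = nmf_cluster(X, k=2)
print("cluster labels:", labels)
# Features with the highest weight in each row of H give a rough
# per-cluster signature.
print("top features per cluster:", np.argsort(-H, axis=1)[:, :3])
```

In this sketch the rows of H play the role of cluster signatures, and cross-tabulating the labels against diagnosis would correspond to the post hoc analysis of disease distribution per cluster.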
In sum, this is, to our knowledge, the first comprehensive approach using ML to define common immunologic features across distinct morphologic subtypes and to illustrate the immunogenetic overlaps between sub-entities previously considered pathophysiologically discrete.